Using Commonsense Knowledge to Automatically Create (Noisy) Training Examples from Text
Abstract
One of the challenges in information extraction is the requirement of human-annotated examples. Current successful approaches alleviate this problem by employing some form of distant supervision, i.e., consulting knowledge bases such as Freebase as a source of supervision to create more examples. While this is perfectly reasonable, most distant-supervision methods rely on hand-coded background knowledge that explicitly looks for patterns in text. In this work, we take a different approach: we create weakly supervised examples for relations by using commonsense knowledge. The key innovation is that this commonsense knowledge is completely independent of the natural-language text. This helps when learning the full model for information extraction, as opposed to simply learning the parameters of a known CRF or MLN. We demonstrate on two domains that this form of weak supervision yields superior results when learning structure compared to simply using the gold-standard labels.
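The distant-supervision baseline the abstract contrasts itself against can be sketched in a few lines: sentences mentioning an entity pair found in a knowledge base are (noisily) labeled with that pair's relation. The knowledge base, entities, and relation names below are purely illustrative, not taken from the paper.

```python
# Minimal sketch of distant supervision: a sentence that mentions both
# entities of a KB pair is labeled with that pair's relation, even when
# the sentence does not actually express it (hence "noisy" labels).

# Tiny stand-in for a knowledge base such as Freebase (illustrative).
kb = {
    ("Barack Obama", "Hawaii"): "born_in",
    ("Paris", "France"): "capital_of",
}

sentences = [
    "Barack Obama was born in Hawaii .",
    "Barack Obama visited Hawaii last week .",  # matched, but a noisy label
    "Paris is the capital of France .",
]

def distant_label(sentence, kb):
    """Return (entity1, entity2, relation) triples for every KB pair
    whose two entities both appear in the sentence."""
    labels = []
    for (e1, e2), rel in kb.items():
        if e1 in sentence and e2 in sentence:
            labels.append((e1, e2, rel))
    return labels

for s in sentences:
    print(s, "->", distant_label(s, kb))
```

The second sentence illustrates the noise problem: string matching labels it `born_in` even though it expresses no birth relation, which is exactly the kind of error the weak supervision proposed in the paper aims to sidestep.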
Similar papers
Using Scripts to help in Biomedical Text Interpretation
Introduction This short note speculates on the use of world knowledge to help interpret a short paragraph of biomedical text about transcytosis (transport across a cell). The ultimate goal is to create a simple representation of the transcytosis process from the text (either automatically or semi-automatically). The challenges are formidable, as the process involves several steps that are often...
Online Inference-Rule Learning from Natural-Language Extractions
In this paper, we consider the problem of learning commonsense knowledge in the form of first-order rules from incomplete and noisy natural-language extractions produced by an off-the-shelf information extraction (IE) system. Much of the information conveyed in text must be inferred from what is explicitly stated since easily inferable facts are rarely mentioned. The proposed rule learner accou...
Extracting Glosses to Disambiguate Word Senses
Like most natural language disambiguation tasks, word sense disambiguation (WSD) requires world knowledge for accurate predictions. Several proxies for this knowledge have been investigated, including labeled corpora, user-contributed knowledge, and machine readable dictionaries, but each of these proxies requires significant manual effort to create, and they do not cover all of the ambiguous t...
Commonsense for Making Sense of Data
In my doctoral research, I address the problem of automatically acquiring commonsense knowledge from text corpora and also from data-sets containing visuals (images, videos) along with textual descriptions. I also aim to exploit the acquired commonsense knowledge for domain-specific and domain-independent applications such as fine-grained search, retrieval and prediction, data integration and a...
Commonsense from the Web: Relation Properties
When general-purpose software agents fail, it is often because they are brittle and need more background commonsense knowledge. In this paper we present relation properties as a valuable type of commonsense knowledge that can be automatically inferred at scale by reading the Web. People base many commonsense inferences on their knowledge of relation properties such as functionality, transitivity,...